XTTS

🧠 XTTS in SkyrimNet — the Default-Quality TTS

XTTS (Cross-lingual Text-to-Speech) is a powerful, deep-learning-based TTS engine that brings realistic, emotionally expressive, and cloneable voices to Skyrim. Unlike simpler TTS engines, XTTS can replicate a specific voice from a short audio clip, making it ideal for immersive, character-specific dialogue in modded Skyrim.

In SkyrimNet, XTTS is used via a local HTTP endpoint, making it easy to integrate and fast enough for real-time use.
It’s currently considered the default voice generation system in SkyrimNet, valued especially for voice cloning, good emotional fidelity, and low latency.

🎙️ What XTTS Does

XTTS converts any input text into high-quality, expressive speech — optionally mimicking a specific voice using a voice reference sample.

Input:
Text: "You're not from around here, are you?"
Voice sample: 10-second clip of a female Nord NPC

Output:
High-fidelity audio of that line, spoken in the same voice and tone as the sample

XTTS produces rich, natural speech, with subtle pauses, intonation, and personality — perfect for Skyrim’s varied characters.

🌐 How XTTS Works in SkyrimNet

Unlike Piper, XTTS is not currently embedded into SkyrimNet. Instead, it runs as a separate local TTS service, typically at:

http://localhost:8002

Here’s how SkyrimNet uses it:

1. SkyrimNet sends a request to the XTTS server with:
   - The text to speak
   - Optional voice reference audio
   - Optional speaker ID or emotion hints
2. XTTS returns a fully rendered WAV or PCM audio clip.
3. SkyrimNet plays the audio in-game, synced with dialogue.

This architecture keeps SkyrimNet lightweight while still offering powerful voice features via XTTS.
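
The request and response flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `/tts` route, the `text` and `voice_sample` JSON field names, and all function names are hypothetical and are not taken from SkyrimNet or the XTTS server. Only the `http://localhost:8002` address comes from this page.

```python
import io
import json
import urllib.request
import wave
from typing import Optional

# Address taken from the text above; everything else here is an assumption.
XTTS_BASE_URL = "http://localhost:8002"


def build_tts_request(
    text: str,
    voice_sample: Optional[str] = None,
    base_url: str = XTTS_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to speak `text`."""
    payload = {"text": text}
    if voice_sample is not None:
        # Hypothetical field name for the optional voice reference clip.
        payload["voice_sample"] = voice_sample
    return urllib.request.Request(
        url=f"{base_url}/tts",  # hypothetical route, not a documented endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_audio(request: urllib.request.Request, timeout_s: float = 10.0) -> bytes:
    """Send the request and return the raw audio bytes from the response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the length of a WAV clip; raw PCM without a header is not handled."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()
```

With a server running, `fetch_audio(build_tts_request("You're not from around here, are you?", "female_nord.wav"))` would return the clip bytes, assuming the server accepts this request shape. Keeping request construction separate from sending makes the payload easy to inspect without a live server.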

🧬 Key Features of XTTS in SkyrimNet

🎭 Voice Cloning: Easily assign unique voices to NPCs using short reference clips
🌍 Cross-lingual Support: Speak English in a French, Argonian, or Dunmer accent
🧠 Emotion Control (planned): Adjust mood and tone of delivery for immersive reactions
♻️ Reusable Voices: Store and reuse custom voices for followers, companions, or even the player
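
To illustrate the voice cloning and reuse idea, here is a minimal sketch of a lookup table that maps NPC names to stored reference clips. The `VoiceRegistry` class, its methods, and the file paths are hypothetical and only show one way such a mapping could be organized; SkyrimNet's actual storage of voices may differ.

```python
from pathlib import Path
from typing import Dict, Optional


class VoiceRegistry:
    """Hypothetical mapping from NPC names to voice reference clips."""

    def __init__(self, default_clip: Optional[str] = None) -> None:
        self._clips: Dict[str, Path] = {}
        # Clip used when an NPC has no voice of its own (may be None).
        self._default = Path(default_clip) if default_clip else None

    def register(self, npc_name: str, clip_path: str) -> None:
        """Store a reference clip for an NPC; names are matched case-insensitively."""
        self._clips[npc_name.lower()] = Path(clip_path)

    def lookup(self, npc_name: str) -> Optional[Path]:
        """Return the NPC's clip, falling back to the default clip if none is set."""
        return self._clips.get(npc_name.lower(), self._default)


registry = VoiceRegistry(default_clip="voices/generic_nord.wav")
registry.register("Lydia", "voices/lydia.wav")
```

Here `registry.lookup("lydia")` returns the clip registered for Lydia, while an unregistered name falls back to the generic clip, so every NPC keeps a consistent voice across lines.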

📦 XTTS vs Piper

Feature	Piper (In-Process)	XTTS (External API)
Speed	⚡ Very fast	⚠️ Slower (1–2s latency)
Voice Quality	✅ Good	✅✅ Excellent
Voice Cloning	❌ Not supported	✅ Full support
Integration	✅ Native DLL	🔌 HTTP endpoint

🚀 Why XTTS is SkyrimNet's Default Quality TTS

🎧 Offers good audio realism
Natural cadence, clear articulation, and emotional depth — ideal for immersive dialogue.
🔁 Supports voice reuse and identity
Easily assign consistent voices to NPCs using short reference samples.
🧠 Enables AI-driven dialogue to feel grounded and believable
Dynamic lines generated by LLMs sound intentional, like a real voice actor spoke them.
💬 Works with any line, whether entered directly or LLM-generated, and makes it sound intentional
Perfect for branching narratives, roleplay mods, and reactive NPC behavior.
